Abstract. - Heavy-tailed inter-event time distributions are widely observed in many
human-activated systems. They may result from both endogenous mechanisms, like the
highest-priority-first protocol, and exogenous factors, like the variation of global
activity over time. Distinguishing the effects of different mechanisms on temporal
statistics is thus of theoretical significance. In this Letter, we propose a new timing
method using a relative clock, where the time length between two consecutive events of
an individual is counted as the number of other individuals' events occurring during
this interval. We propose a model in which agents act either at a constant rate or with
a power-law inter-event time distribution, and the global activity either stays
unchanged or varies periodically over time. Our analysis shows that the heavy tails
caused by the heterogeneity of global activity can be eliminated by using the relative
clock, whereas the heterogeneity due to real individual behaviors persists. We perform
extensive experiments on four large-scale systems: the search engine by AOL, a social
bookmarking system (Delicious), a short-message communication network, and a
microblogging system (Twitter). Strong heterogeneity and clear seasonality of global
activity are observed, but the heavy tails cannot be eliminated by using the relative
clock. Our results suggest the existence of endogenous heterogeneity in human dynamics.
Introduction. - Characterizing and understanding human activity patterns is necessary
to explain many socioeconomic phenomena and could find significant applications ranging
from resource allocation to transportation control, and from epidemic prediction to
interface design for Internet users [1,2]. One of the most attractive observations is
the heavy-tailed nature of human temporal activities, with the inter-event time
distribution usually being close to a power-law form. Examples include email
communication [3], surface mail communication [4,5], cell-phone communication [6-8],
online activities [9-12], and so on.



Many endogenous mechanisms of human activity have been put forward to explain the
observed heavy-tailed statistics, such as task priority [3,13], varying interest
[14,15], memory effects [16], human interactions [17-19], and so on. Besides the efforts
to uncover endogenous mechanisms, a somewhat pessimistic argument is that the observed
heavy-tailed statistics hardly reveal significant ingredients or provide insights into
human activity patterns, and may instead originate from some trivial exogenous
factors¹. In particular, the heterogeneity and seasonality² of human activities have
recently been recognized as one candidate to explain the heavy-tailed inter-event time
distribution [20,21]. Putting the mathematics



1 Here we use the word "exogenous" to refer to factors not related to the essential
motivations or stimulations from the actions of other people.

2 Denote by M(T) the global activity of the population (i.e., the number of events
during the T-th time window). The heterogeneity lies in the broad distribution of M,
and the seasonality is evidenced by M(T) ≈ M(T + P), where P is the time period,
normally a day and/or a week in our daily behaviors.
Fig. 1: Illustration of the absolute clock and the relative clock used to define the
inter-event time. A, B and C refer to three different individuals and the vertical
lines stand for actions.



behind, the basic idea is simple. Take short-message communication as an example: an
individual usually does not send messages while sleeping, which produces long time
intervals, and compared with the frequent communications in the daytime, these
intervals spanning midnight contribute to the heavy tails. Accordingly, the observed
statistical regularities may result from hybrid mechanisms [22]: some of them are
endogenous, like the highest-priority-first rule [3], while others are exogenous, like
activity heterogeneity [20] and seasonality [21].

In this Letter, we propose a new timing method that can eliminate the heavy tails in
the inter-event time distribution caused by activity heterogeneity. We analyze a model
in which agents act with either an exponential or a power-law inter-event time
distribution, and the global activity either stays unchanged or varies periodically
over time. Simulation results show that the heavy tails caused by the heterogeneity of
activity can be eliminated by using the relative clock, whereas the heterogeneity due
to endogenous individual behaviors persists. Comparing the modeling results with
experiments on four large-scale real systems, we conclude that the temporal activity
contains endogenous heterogeneity that cannot be explained by assuming Poissonian
agents with seasonality.

Relative Clock. - The heterogeneity of human activity over time has been observed in
many online systems; the four real systems shown later in Fig. 5 are examples. As
mentioned above, the statistics of inter-event times at the population level may
result from hybrid mechanisms, and it is therefore valuable to design a method that can
filter out the effects caused by exogenous heterogeneity. Traditionally, the
inter-event time is defined as the time interval between two consecutive events.
Figure 1 illustrates a simple example where individual A acts at times $\alpha_1 = 2$,
$\alpha_2 = 5$, $\alpha_3 = 58$, $\alpha_4 = 85$ and $\alpha_5 = 96$, so that the four
time intervals are $\alpha_2 - \alpha_1$, $\alpha_3 - \alpha_2$, $\alpha_4 - \alpha_3$
and $\alpha_5 - \alpha_4$. This timing method is called the absolute clock in this
Letter. Consider a system with strong heterogeneity of human activity over time. For
example, in a short-message communication network, an individual may send on average
more than



Table 1: Inter-event times for individual A in Fig. 1. The upper row gives the results
based on the absolute clock and the lower row those based on the relative clock. For
the relative clock, we use the number of events plus one to avoid zero intervals.

            $(\alpha_1,\alpha_2)$   $(\alpha_2,\alpha_3)$   $(\alpha_3,\alpha_4)$   $(\alpha_4,\alpha_5)$
Absolute    3                       53                      27                      11
Relative    1                       10                      3                       3




ten messages at noon yet fewer than one message around midnight. As a time interval,
1 h is relatively long at noon, yet 10 h is usual across midnight. Therefore, the
absolute clock is strongly affected by the activity heterogeneity and may fail to
capture the endogenous human activity patterns. Accordingly, we propose a new timing
method using a relative clock, where the time length between two consecutive events of
an individual is counted as the number of other individuals' events occurring during
this interval. Considering the population A, B and C shown in Fig. 1, the inter-event
time between the events of individual A at $\alpha_2$ and $\alpha_3$ is counted as the
number of events of individuals B and C between $\alpha_2$ and $\alpha_3$. Table 1
presents the results of the two definitions of the inter-event time for individual A.
Compared with the absolute clock, the relative clock runs faster at times with frequent
events, and can be considered a kind of time-rescaling method that eliminates the heavy
tails of the inter-event time distribution caused by the activity heterogeneity.
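
As a minimal sketch of the relative clock (an illustration, not the authors' code), the function below counts, for each pair of consecutive events of one individual, the events of all other individuals that fall strictly between them, and adds one as in Table 1. The event times of A are those quoted in the text; the event times of B and C are hypothetical values chosen only so that the output reproduces Table 1, since their numerical values are not given in the text. Counting only events strictly inside the interval is also an assumption.

```python
from bisect import bisect_left, bisect_right


def absolute_intervals(own_times):
    """Inter-event times on the absolute clock (plain time differences)."""
    own = sorted(own_times)
    return [end - start for start, end in zip(own, own[1:])]


def relative_intervals(own_times, other_times):
    """Inter-event times on the relative clock.

    Each interval is the number of other individuals' events strictly
    between two consecutive own events, plus one to avoid zero intervals.
    """
    own = sorted(own_times)
    others = sorted(other_times)
    intervals = []
    for start, end in zip(own, own[1:]):
        # index of first other event > start, and of first other event >= end
        n_between = bisect_left(others, end) - bisect_right(others, start)
        intervals.append(max(0, n_between) + 1)
    return intervals


# Individual A from the text; B and C are hypothetical illustration values.
a_times = [2, 5, 58, 85, 96]
b_times = [8, 12, 20, 31, 40, 60, 90]
c_times = [15, 25, 33, 50, 70, 88]

print(absolute_intervals(a_times))
print(relative_intervals(a_times, b_times + c_times))
```

Sorting the pooled events of the other individuals once and locating each interval by binary search keeps the cost near O(n log n), which matters for data sets with millions of events.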

Model. - To see the difference between the absolute and relative clocks, we first
study a theoretical model. The model spans 10 days with one-second resolution, namely
it contains 864000 time steps. Each day is divided into 24 hours and, for simplicity,
the global activity within an hour stays unchanged. Accordingly, for each hour $i$, we
denote its activity by $\lambda_i$. For the first day, the value of $\lambda$ for each
of the 24 hours is sampled from a given distribution $\Psi(\lambda)$. To account for
the seasonality, the following 9 days repeat the activity pattern of the first day,
that is, $\lambda_i = \lambda_{i+24}$. All the $N$ individuals in the model have the
same temporal statistics. We consider four cases: (i) Every individual follows a
Poissonian process with rate $r$, that is, at each second, an arbitrary individual acts
with probability $r\lambda_i$, where $i$ denotes the current hour. We assume a constant
global activity, say $\lambda_1 = \lambda_2 = \cdots = \lambda_{24} = \lambda$.
(ii) Same as case (i), but $\lambda_i$ ($i = 1, 2, \cdots, 24$) are independently
sampled from a uniform distribution in the range (0, 1), say $\Psi(\lambda) = U(0, 1)$.
(iii) Every individual acts with an endogenous power-law inter-event time distribution
$\Phi(t) \sim t^{-\beta}$. In the beginning, each individual samples an inter-event
time $t$ from $\Phi(t)$ and acts at time $t/\lambda_1$. Then, after each act, the
individual resamples an inter-event time $t'$ from $\Phi(t)$ and acts after
$t'/\lambda_i$ seconds, where $i$ denotes the current hour. This rule reflects the fact
that
Fig. 2: How the global activity $A_i$, quantified by the number of events in the
$i$-th hour, changes with time. The four plots correspond to cases (i), (ii), (iii) and
(iv), respectively. The parameters are $N = 100$, $r = 0.2$, $\lambda = 0.5$ and
$\beta = 2$. Strong heterogeneity and clear daily seasonality are observed for cases
(ii) and (iv), whereas only random fluctuations are present in cases (i) and (iii).



in an inactive time period, the inter-event time tends to be longer, and vice versa.
If the time interval spans more than one hour, only the activity of the starting hour
affects the real length of the time interval. We assume
$\lambda_1 = \lambda_2 = \cdots = \lambda_{24} = \lambda$. (iv) Same as case (iii), but
$\lambda_i$ ($i = 1, 2, \cdots, 24$) are independently sampled from the uniform
distribution $U(0, 1)$.
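
The four cases can be sketched for a single agent as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are ours; the per-second Bernoulli trials of cases (i) and (ii) are replaced by geometric waiting times restarted at each hour boundary, which is statistically equivalent because the trials are memoryless; and the power-law sampling uses a continuous inverse-transform draw with a hypothetical lower cutoff `t_min = 1`, whereas the method actually used for Fig. 3 is cited in the original and not reproduced here.

```python
import math
import random

SECONDS_PER_HOUR = 3600


def geometric_gap(rng, p):
    """Number of one-second trials up to and including the first success."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log1p(-p)) + 1


def poisson_agent(rng, lam, r, n_days):
    """Cases (i)/(ii): act each second with probability r * lam[hour]."""
    events = []
    for hour in range(24 * n_days):
        p = r * lam[hour % 24]
        if p <= 0.0:
            continue
        start = hour * SECONDS_PER_HOUR
        t = start - 1
        while True:
            t += geometric_gap(rng, p)
            if t >= start + SECONDS_PER_HOUR:
                break  # memoryless: restart with the next hour's probability
            events.append(t)
    return events


def powerlaw_agent(rng, lam, beta, n_days, t_min=1.0):
    """Cases (iii)/(iv): sample t with density ~ t**(-beta), wait t / lam[hour]."""
    horizon = 24 * n_days * SECONDS_PER_HOUR
    events, now = [], 0.0
    while True:
        rate = lam[int(now // SECONDS_PER_HOUR) % 24]  # only the starting hour matters
        if rate <= 0.0:
            break
        u = 1.0 - rng.random()
        t = t_min * u ** (-1.0 / (beta - 1.0))  # inverse-transform sample, beta > 1
        now += t / rate
        if now >= horizon:
            break
        events.append(now)
    return events


# Reduced-size demo (the text uses N = 100 agents and 10 days).
rng = random.Random(1)
lam_uniform = [rng.random() for _ in range(24)]  # cases (ii)/(iv)
lam_constant = [0.5] * 24                        # cases (i)/(iii)
population = [poisson_agent(rng, lam_uniform, r=0.2, n_days=2) for _ in range(10)]
print(sum(len(events) for events in population))
```

Passing `lam_constant` instead of `lam_uniform` gives cases (i) and (iii); repeating the same 24 values every day is what produces the daily seasonality.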

Figure 2 displays the global activity $A_i$ ($i = 1, 2, \cdots, 240$) for the 240
hours, where $A_i$ is the total number of events in the $i$-th hour. For cases (i) and
(iii), the global activity is homogeneous, and thus the relative clock does not change
the overall statistical regularities, although it can reduce the fluctuations to some
extent. The heterogeneity of global activity is a necessary, but not a sufficient,
condition for the elimination of the heavy tail by the relative clock.
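
The hourly global activity can be obtained by simple binning of event times. The sketch below also includes a crude seasonality score, the mean absolute difference between windows one period apart divided by the mean activity; this score is our own illustrative choice and is not a quantity defined in the text.

```python
def hourly_activity(event_times, n_hours, seconds_per_hour=3600):
    """Count events per hour; times are in seconds from the start of the data."""
    counts = [0] * n_hours
    for t in event_times:
        idx = int(t // seconds_per_hour)
        if 0 <= idx < n_hours:
            counts[idx] += 1
    return counts


def seasonal_mismatch(counts, period):
    """Mean |A_i - A_{i+period}| relative to the mean activity (0 = periodic)."""
    if len(counts) <= period:
        raise ValueError("need more than one period of data")
    diffs = [abs(a - b) for a, b in zip(counts, counts[period:])]
    mean_count = sum(counts) / len(counts)
    if mean_count == 0:
        return 0.0
    return (sum(diffs) / len(diffs)) / mean_count


# Pooled events of all individuals would be passed in here.
print(hourly_activity([0, 10, 3599, 3600, 7300], n_hours=3))
```

The same binning with `seconds_per_hour=60` yields a per-minute activity series.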

Figure 3 reports the simulation results for the toy model, from which we conclude
that: (i) As shown in Fig. 3(c), a power-law-like inter-event time distribution can
result from the heterogeneity of global activity³ even though all individuals are
identical and each individual obeys a Poissonian process within each hour. This
supports the theoretical analyses of Refs. [20,21]. In fact, endogenous factors,
exogenous factors and their combination can all generate heavy tails in $p(\tau)$ on
the absolute clock, as shown in Fig. 3(e), 3(c) and 3(g), respectively. (ii) As shown
in Fig. 3(d), the inter-event time distribution based on the relative clock follows an
exponential form, that is to say, the heavy tail resulting from the heterogeneity of
global activity can be effectively eliminated by the proposed timing method. (iii) Com-

3 We introduce a periodic global activity into the model to mimic the seasonality
observed in real systems [10,12,21]. However, the seasonality does not essentially
contribute to the heavy tail in the inter-event time distribution. For example, if we
assume that $\lambda_i$ ($i = 1, 2, \cdots, 240$) are independently sampled from
$U(0, 1)$, then the seasonality is eliminated yet the heavy tail in the distribution
$p(\tau)$ still exists.
Fig. 3: Comparison of inter-event time distributions $p(\tau)$ based on the absolute
and relative clocks. All the distributions presented in this figure come from the
theoretical model: case (i)-(a)(b), case (ii)-(c)(d), case (iii)-(e)(f), case
(iv)-(g)(h). The left and right plots correspond to the distributions on the absolute
and relative clocks, respectively. Plots (a), (b) and (d) are on a log-linear scale,
while plots (c), (e), (f), (g) and (h) are on a log-log scale. The parameters are
$N = 100$, $r = 0.2$, $\lambda = 0.5$ and $\beta = 2$. The power-law sampling on
$\Phi(t)$ follows the method in




Fig. 4: (Color online) Analytical results for the inter-event time distributions on
the relative clock. Plots (a) and (b) correspond to Fig. 3(b) and 3(f), respectively,
with black circles representing the simulation results and red curves the analytical
solutions.
Table 2: The number of users and the number of events in the four real data sets. The
last column gives the origin of each data set; the last two data sets are first
reported in this Letter.

Data Sets    #Users     #Events     Origins
AOL          356610     4596212     m
Delicious    256676     1252947     m
SM           1479480    28951117    This Letter
Twitter      2711178    9966800     This Letter




paring Fig. 3(f) with 3(e), as well as 3(h) with 3(g), it is clear that the endogenous
heterogeneity, embodied in the power-law distribution $\Phi(t) \sim t^{-\beta}$, cannot
be eliminated by timing with the relative clock. (iv) A peak near the head of
$p(\tau)$ emerges in all cases when using the relative clock.

To explain the existence of the peak, we calculate the inter-event time distribution
$p(\tau)$ on the relative clock. Notice that, since the relative clock eliminates the
heterogeneity of global activity, the specific form of $\Psi(\lambda)$ has almost
nothing to do with $p(\tau)$ (as evidence, the distributions shown in Fig. 3(b) and
Fig. 3(d) are almost the same, as are those shown in Fig. 3(f) and Fig. 3(h)). Consider
two independent stochastic processes: the actions of an individual and the actions of
all others. Given a monitored individual $i$, we assume that her acting frequency
(i.e., the number of events per unit time) is $f_i$, and that the total acting
frequency of the other individuals is $f_{\bar{i}} = \sum_{j \neq i} f_j$. Then the
probability density of the inter-event time of individual $i$ is:

$$P(t) = f_i e^{-f_i t}. \qquad (1)$$

Notice that here we assume individual $i$ acts at most once in one time step, namely
$f_i < 1$, and in each time step $i$ activates an event with probability $f_i$. In
principle, we can assume that the time resolution is fine enough that at each time step
there is at most one event from all other individuals, occurring with probability
$f_{\bar{i}}$. During $t$ time steps, the probability distribution of the cumulative
number $a$ of events of all other individuals reads



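$$q(a) = C_t^a \, f_{\bar{i}}^{\,a} \, (1 - f_{\bar{i}})^{t-a}, \qquad (2)$$

where $C_t^a = \frac{t!}{a!\,(t-a)!}$ and $f_{\bar{i}}$ is the total acting frequency
of the other individuals. When the activity of the individuals can be approximated as a
Poisson process, we can get the probability distribution of the inter-event time on the
relative clock through the joint probability distribution:

$$p(\tau) = \int \frac{(f_{\bar{i}} t)^{\tau-1}}{(\tau-1)!} \, e^{-f_{\bar{i}} t} \, f_i e^{-f_i t} \, dt. \qquad (3)$$
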
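
The Poisson approximation invoked here is the standard small-probability limit of a binomial count. As a sketch, writing $f$ for the per-step probability that some other individual acts (our shorthand, not necessarily the authors' notation) and letting $f \to 0$ and $t \to \infty$ with $ft$ held fixed:

```latex
\binom{t}{a} f^{a} (1-f)^{t-a}
  = \frac{t(t-1)\cdots(t-a+1)}{a!}\, f^{a}\, (1-f)^{t-a}
  \;\longrightarrow\;
  \frac{(f t)^{a}}{a!}\, e^{-f t}
  \qquad (f \to 0,\ t \to \infty,\ f t \ \text{fixed}).
```

Because the relative clock adds one to the number of intervening events (see Table 1), a relative inter-event time $\tau$ corresponds to $a = \tau - 1$ events of the other individuals, which is where the factorial $(\tau - 1)!$ comes from.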



Even when the total acting frequency of the other individuals exceeds one,
$f_{\bar{i}} > 1$, we have checked numerically that Eq. (3) reproduces the front peak
in $p(\tau)$ well. For example, in the case shown in Fig. 4(a), $f_i = 0.1$ and
$f_{\bar{i}} = 9.9$. Similar to the Poissonian cases, when the endogenous time
Fig. 5: The activity M versus time with one-minute resolution for AOL (a), Delicious
(b), SM (c) and Twitter (d). The vertical dashed lines separate the 10 days.



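interval follows a power-law distribution, the probability distribution of the
inter-event time on the relative clock is:

$$p(\tau) \propto \int \frac{(f_{\bar{i}} t)^{\tau-1}}{(\tau-1)!} \, e^{-f_{\bar{i}} t} \, t^{-\beta} \, dt, \qquad (4)$$

where $f_{\bar{i}}$ is the total acting frequency of the other individuals and $\beta$
is the power-law exponent. Figure 4 reports the analytical solutions of Eq. (3) and
Eq. (4), which agree very well with the simulations.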

Data. - This Letter analyzes four large-scale real systems and, for fair comparison,
every data set presented here spans 10 days. The data sets are described below, with
basic statistics shown in Table 2. (i) AOL: Previously known as America Online, AOL is
a company providing Internet services and media. This data set records the search
behavior of Internet users with one-second resolution, from March 10, 2006 to March 20,
2006. The inter-event time is defined as the time interval between two consecutive
queries by a user. (ii) Delicious: A web site that aims to help users collect the
tastiest bookmarks on the web. The data set contains the bookmarks added by users with
one-second resolution, starting from September 5, 2009 and lasting 10 days. Each record
(i.e., event) contains the operation time, the user ID, the Uniform Resource Locator
(URL), and so on. The inter-event time is defined as the time interval between two
consecutive collections of bookmarks by a user. (iii) SM: Short Message is probably the
most widely used electronic communication tool in people's daily life. This data set
runs from December 10, 2010 to December 20, 2010, with one-second resolution. Each
record consists of three elements: a sender ID, a receiver ID, and the time stamp. The
inter-event time is defined as the time interval between two consecutive short messages
sent by the same user. (iv) Twitter: A microblogging system in which users can upload
their posts (i.e., microblogs) and other users, especially their
Fig. 6: Inter-event time distributions based on the absolute clock for AOL (a),
Delicious (b), SM (c) and Twitter (d). These curves partially display a power-law-like
shape, yet they cannot be accurately fitted by simple power laws. The solid lines are
only guides to the eye.



followers, may comment on and/or forward these posts. The data set starts from
November 10, 2009 and lasts 10 days, with one-second resolution, recording only the
upload time of original posts. The inter-event time is defined as the time interval
between two consecutive posts by a user.

Experimental Results. - Figure 5 reports the global activity M(T) versus the time T,
where the whole data set is divided into 14400 segments, each lasting one minute. That
is to say, M(T) is the number of events of the population in the T-th minute. Every
system displays strong heterogeneity⁴ and daily seasonality⁵.

The inter-event time distributions based on the absolute clock are shown in Fig. 6.
For AOL and Delicious, apart from slightly drooping heads, the distributions can be
well approximated by power laws. The distributions for SM and Twitter are more
complicated, with only the middle parts following power laws. The whole distributions
cannot be accurately fitted by power laws, and the solid lines are only guides to the
eye. In fact, we are not interested in whether these distributions are power laws; what
matters is that they typically span six orders of magnitude, which is more than enough
to demonstrate the burstiness of temporal activities.

Compared with Fig. 3(c), 3(d) and Fig. 2, the observed 
broad inter-event time distribution may mainly result from 
the heterogeneity of activity shown in Fig. 5. If so, the dis- 



4 The typical ratio between the peak and low-lying values of M is about $10^2$ in AOL,
Delicious and SM, which is a huge difference. Even for Twitter, the peak value of M can
be twice as large as the low-lying one.

5 Here we mainly concentrate on the daily seasonality, yet for longer data we could
also observe the weekly seasonality (see, e.g., the weekly seasonality in Netflix
[10]).



Fig. 7: Inter-event time distributions based on the relative clock for AOL (a),
Delicious (b), SM (c) and Twitter (d). Similar to Fig. 6, the solid lines are only
guides to the eye.



tribution $p(\tau)$ should be narrowed when using the relative clock. Figure 7 reports
$p(\tau)$ obtained with the relative clock. In accordance with the fourth point above
and with Fig. 4 obtained from the theoretical model, every distribution has a peak near
the head (most noticeably for SM and Twitter). However, unlike the change from
Fig. 3(c) to 3(d), the heavy tails in $p(\tau)$ are neither weakened nor eliminated by
using the relative clock. These results strongly suggest that the observed heavy-tailed
nature cannot be simply explained by activity heterogeneity or seasonality.

Discussion. - In this Letter, we proposed a new timing method based on the so-called
relative clock, where the time interval between two consecutive events of an individual
is quantified by the number of other individuals' events occurring during this
interval. This method is expected to eliminate the effects of the heterogeneity of
global activity on the inter-event time distribution. The simulation results on the
theoretical model have demonstrated the effectiveness of our method, and by comparing
the simulations with the experiments, we conclude that the observed heavy-tailed nature
of human online temporal activities cannot be well explained by Poissonian agents with
activity heterogeneity, regardless of whether seasonality is present in the activity
time series.

Human behavior is one of the most complex and complicated things, driven by countless
unknown factors. Therefore, given a certain statistical feature, distinguishing the
effects of different factors is very significant. Our method can successfully filter
out the effects caused by activity heterogeneity, yet it is not omnipotent. For
example, our method may lead to bias when there is a strong trend in the global
activity⁶, and thus in these cases, detrending



6 In online systems, usually as the number of users increases, the 
global activity will also increase, and a certain length of absolute 
time interval will thus become larger and larger on relative clock. 
algorithms [25-27] have to be combined with our method. In addition, the effects of
heterogeneity among individuals (i.e., different individuals act at different rates),
which is also a known candidate that may contribute to the heavy-tailed inter-event
time distribution [20], cannot be filtered out by our method. The rescaling method
[12,28] based on the average inter-event time may be helpful in judging whether active
and inactive individuals act with essentially different patterns.

As a starting point for designing effective tools to distinguish the effects of
different factors on the statistical regularities of human dynamics, the present method
is simple and imperfect, yet it may largely complement the current understanding of our
behavioral patterns. In summary, the main contributions of this Letter are threefold.
Firstly, by using a theoretical model, we show that the heavy-tailed nature at the
population level may result from an exogenous factor, the activity heterogeneity, and
that the timing method based on the relative clock can successfully eliminate such
exogenous effects. Secondly, extensive empirical analysis reveals the heavy-tailed
inter-event time distributions of typical online systems, and suggests the existence of
endogenous mechanisms that cannot be explained by activity heterogeneity or seasonality
over time. Lastly, this Letter reports many novel empirical results to the scientific
community, which could facilitate studies on human dynamics. Although our knowledge
about human behavior increases incessantly, it is never sufficient. We believe this
work has added new insights and rich empirical materials to our knowledge.